BINARY TREE STRUCTURES FOR DATA BASE INDEXING OF DBF FILES

    The set of data structures, procedures, functions and associated code 
provided will allow the developer to maintain binary tree sorted list files 
of DBF files (DBASE III+ compatible) based on a Field Array of up to 10 DBase 
Field Descriptors for each index structure needed.  These are true binary 
trees that were taught to you in college DB Architecture class:

                                Root
                                /  \
                              /      \
                            Son     Daughter
                          / \           / \
                        /     \       /     \
           NULL----->...      ...    ...    Daughter
                                             /  \
                                           ...   ...

    And yes, this structure carries with it the same liability it always has;
the necessity of wasting space for NULL ADDRESS pointers which I don't con-
sider all that bad given the capabilities of the structure for very large
data bases where it's true efficiency (when balanced) and simplicity still 
rivals more modern, complex indexing schemes.

    Within these HDX index files is the capability, through code and data
structure development to maintain and control a MAINTENANCE STACK of unused
address space within the file using a somewhat hybrid linked list data struc-
ture whose size in bytes matches the size in bytes of the node structure for 
the tree.

    All this stack creation and maintenance is carried out in order to max-
imize and utilize file space released through record deletions and still 
maintain index integrity without having to rebuild the files after associated 
DBF files have been packed (which procedure does actually rebuild the file 
image by copying active records to a new file and skipping over those that 
are marked for deletion) thereby requiring a consolidation of the associated
index files.  This real-time data integrity of these index files also lends
itself quite well to the development of a NETWORKABLE Database and real-time
multi-user index structure (more on this later).

    In order to conserve space, all HDX index files have HEADERS, along with
the abovementioned node and stack elements and individual index headers (see
below).  These headers store the image of each root node for each index (yes, 
more than one index can reside within one HDX file) and each subsequent LChld 
(SON) or RChld (DAUGHTER) Node then holds it's own address as well as that of 
it's PARENT, Son & Daughter and the RECORD NUMBER where the index entry INFO 
is stored in the DBF file.  For purposes of clarity, which every programmer 
will tell you is the true measure of how good a person you are, let me say 
that it goes without saying that ADDRESS herein refers to the FILE OFFSET 
within the HDX file of these structures.

    Stack elements will be maintained in the HDX file in the same file header.
There the first record of this linked list is stored and all unused node/stack
addresses within the HDX file are maintained through this list for use in sub-
sequent insert operations.

    Comparisons within this structure are done "on the fly" and no overhead
for comparison text/info is made within the HDX file.  One of my frustrations
with Dbase NDX files was the amount of memory used to replicate information
already found within the database.  This duplication of information seemed
wasteful, yet apparently the prime reason for this method is to speed up any
searches by limiting disk accesses to a minimum.  This really isn't a problem
anymore and file space is; so, for now that's about as far as I'll go in 
explaining the rationale for my file scheme.  Except to say that I had a net-
work, multi-user environment in mind when I designed it and this way seemed 
the best for DB development where record locking/unlocking was my responsi-
bility and real-time index integrity accross a multi-user platform could be 
reasonably assured.

    The resulting HDX file, but for the header, is basically a "mish-mash" of 
information with no apparent structure when viewed under the harsh eyes of a 
diagnostic rationale and can be very volatile were a sudden power surge or
network failure to occur.  So as difficult as it already is to lay down the
basics of my design, I will stick to a single user environment which can 
still utilize this scheme and avoid talk of "networkability" as much as pos-
sible.

    Reindexing under this structure is relegated to HDX file recreations after
data integrity has been lost due to the aforementioned and any other computer 
disasters. Binary tree algorithm search speed will compensate for reads and
comparisons done "on the fly".

    Lastly, since computer disasters are inevitable and there exists the very 
real possibility of a "linear tendency" within a binary tree structure which 
will wreak havoc on any recursive procedure (did I mention I used recursion?), 
I have included a random number range generator which will create a "random" 
sequence of record numbers during insertion to insure balanced tree formation 
when reindexing a DBF file (near as I can tell, this generator fills up to 
90-95% of numbers in the range on one pass).  Any chance that records were 
entered in the order specified by an index necessitates this method of tree 
formation for existing DBF files to maximize speed and memory and eliminate
potential problems.  I won't go into much esoteric or academic detail regar-
ding binary trees and their implementation; so, if you are not familiar with 
them I strongly suggest you research the topic before I thoroughly muddy the 
waters with my implementation.

    All code, data structures and algorithms were designed in and for use in 
the Visual Basic for Dos programming environment and for BINARY file mode
implementation.

Don't you just hate the legal niceties of life?
_____________________________________________________________________________
This specification was developed by J. Lerma Jr. \dba\ InterActive Solutions.
Although it has been released into the public domain and is not confidential
or proprietary, the specification is still the copyright and property of 
J. Lerma Jr. & InterActive Solutions.

Disclaimer of Warranty

J. LERMA JR. & INTERACTIVE SOLUTIONS EXCLUDE ANY AND ALL IMPLIED WARRANTIES, 
INCLUDING WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. 
NEITHER J. LERMA JR. OR INTERACTIVE SOLUTIONS MAKE ANY WARRANTY OF REPRESENTA-
TION, EITHER EXPRESS OR IMPLIED, WITH RESPECT TO THIS SPECIFICATION, ITS QUA-
LITY, PERFORMANCE, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. 
NEITHER J. LERMA JR. OR INTERACTIVE SOLUTIONS SHALL HAVE ANY LIABILITY FOR 
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RESULTING 
FROM THE USE OR MODIFICATION OF THIS SPECIFICATION.
_____________________________________________________________________________
Whew, glad that's over.

    All information on DBF file structures and code for their implementation
is credited to Ethan Winer and his book: 

                       "BASIC Techniques & Utilities"
                        PC Magazine / ZD Press
                        Emeryville, California
                        Copyright 1991


    If you don't already have this book, I recommend it highly. On with the
show.                                    

HDX File Header-------------------------------------------------------------
               Lngth in
Address        Bytes           Desc
----------------------------------------------------------------------------
   1             2             Header Flag, Reserved    
   3             2             Word for Integer Index Header Count 
   5           396             99 Dwords, addresses for Index Headers(*)
 401           568             142 Dwords, Reserved 
 969             4             Dword, Reserved for HDX signature: Long Int 969
 973            20             First Stk Element
 993             4             Dword, Reserved
 997             4             Dword, Long Int Length of File
----------------------------------------------------------------------------
File Offset:  1000 Bytes for HDX Header     
          (*)  100 Bytes for Index Header -- NDXVOL  (see below)
                20 Bytes for Node Record  -- NDXNODE (see below)
                20 Bytes for Stack Record -- STKTYPE (see below)


    The file header takes up the first 1000 bytes of the HDX file and is where
all index requests begin. For my purposes, I have limited the individual index
count to 99 within an HDX file (changing this would alter the header size, the
signature Dword position and in general wreak havoc with this scheme).  The 
last four bytes of the header contain the HDX file length which is updated
after each insert request.

    Subsequent data structures to be found in the file beginning at position 
    1001 include:

TYPE NdxVol                     TYPE NdxNode                TYPE StkType
 Flag   AS INTEGER               RecNo AS LONG               Addr1 AS LONG   
 TName  AS STRING * 8            Prnt  AS LONG               Addr2 AS LONG   
 DName  AS STRING * 8            LChld AS LONG               Addr3 AS LONG   
 HdrLng AS INTEGER               RChld AS LONG               Nxt   AS LONG   
 RecLng AS INTEGER               Addr  AS LONG               Addr  AS LONG   
 Unq    AS INTEGER              END TYPE                    END TYPE    
 TRecs  AS LONG
 Root   AS NdxNode
 FldCnt AS INTEGER
 KeyFld(0 TO 9) AS FCBlk (See below)
END TYPE

    We begin with NdxVol.
------------------------
    The upto 99 Index headers can occur anywhere inside the HDX file, allowing
for new index creation for an HDX file already in use.  This necessitates the
allocation of space in the header to store their addresses within the file as
well as a count of Indexes in same.

    Flag   - for network use only. 
    TName  - an 8 character unique name given to the index for identification 
             purposes.
    DName  - an 8 character name taken from the DBF file (i.e. Customer.dbf
             would require a DName of "Customer" for the Index Header).
    HdrLng - DBF file Header length (See below)
    RecLng - DBF file Record length (See below)
    Unq    - Integer flag for Index's "unique" elements status: True\False
    TRecs  - Total number of records in DBF file (See below)
    Root   - See NdxNode
    FldCnt - Count of number of Key fields to index by
    KeyFld - Data structure holding Key Field information (See DBF structures)
             for the 10 Key fields:
             TYPE FCBlk
              FNum AS INTEGER           -  Offset of field in record
              FType AS STRING * 1       -  Field type: "N","C","D","L","M"
              FLen AS INTEGER           -  Length of field in record
             END TYPE

    NdxNode.
------------
    This 20 byte data structure is the heart of the HDX file.  5 Long Integers
provide orientation, position, direction and location for itself and the 
record number for it's info in the DBF file.  This is necessary since nothing
can be accomplished without knowing exactly where you are inside an HDX file 
at any given time.

    Recno -  Position of record within it's DBF file.  The obvious problem
             here is that a PACKING procedure on a DBF file would necessitate
             realignment of all records in an index under this structure. An
             alternative would be a seperate maintenance facility of records
             in a DBF file to use empty record positions after copying deleted
             record info to a safe area and avoid packing altogether.
    Prnt  -  Points to the index entry's PARENT position within the HDX file.
             The Parent is an NdxNode which was previously inserted and based
             on a comparison criteria, accepts the current NdxNode's entry and
             assigns or passes it's position further down the tree.
    --------------------------------------------------------------------------
             In a BALANCED tree, all but the last generation of nodes utilize
             these variables for something other than NULL storage; but, there
             is no way around the fact that a significant amount of Dwords in
             the HDX file are used simply for maintaining NULL pointers in a 
             PURE binary tree scheme.

    LChld -  Points to an NDXNode's Less-Than/Greater-Than SON depending on 
             comparison criteria.
    RChld -  Points to an NDXNode's Less-Than/Greater-Than DAUGHTER depending 
             on comparison criteria.
    --------------------------------------------------------------------------
    Addr  -  All NdxNode's have their file offset stored within themselves.
             Although this sounds redundant it makes traversal, maintenance,
             and all other operations easier and faster.

    StkType.
------------
    Sizing this data structure to match the size of an NdxNode was crucial to
    its design. Still in all, using a Stack Element which holds upto 3 empty
    positions in an HDX file before giving itself up for use was not all that
    difficult a task to implement.

     Addr1,2,3 - Holds addresses of empty positions in HDX file. (Once again
                 NULL pointers will waste space, but no more so than had the
                 address held a normal NdxNode)
     Nxt       - Holds address of next Stack node in the linked list and
                 is NULL until the Node is full and another position has
                 been freed.  This being the case, the new position is 
                 converted to a Stack Element, it's address provided for
                 the previous last entry in stack and placed here, and the
                 new position's Addr1,2,3 are NULLED.
     Addr      - Stack element's offset inside the HDX file.

------------------------------------------------------------------------------
Dbase DBF Structures
------------------------------------------------------------------------------
     For interpretation of these data structures, I refer you the the afore-
mentioned publication by Ethan Winer, my code, and any other reference mate-
rial you can find. The structures and code have been modified slightly from 
Mr. Winer's description to accomodate my indexing implementation and are 
better off left as is (same applies to the code) unless you are well acquain-
ted with DBASE file structures.

TYPE DBFHeadStruc                           TYPE FieldStruc
  Version  AS INTEGER                        FName AS STRING * 10   
  Memo     AS INTEGER                        FType AS STRING * 1
  Ano      AS INTEGER                        FOff  AS INTEGER
  Mes      AS INTEGER                        FLen  AS INTEGER   
  Dia      AS INTEGER                        Dec   AS INTEGER
  FirstRec AS INTEGER                       END TYPE    
  TRecs    AS LONG
  RecLen   AS INTEGER
  TFields  AS INTEGER
END TYPE
------------------------------------------------------------------------------
------------------------------------------------------------------------------
This structure utilizes the DBF data structures above to define the DBF file
being worked with an associated index:

TYPE DBFileType
 FName AS STRING * 8         - Same as NdxHeader's DName.
 Status AS INTEGER           - In single user scheme; holds file handle.
 FileNo AS INTEGER           - In multi-user scheme; holds file handle.
 Fields(100) AS FieldStruc   - Limits the number of fields, but is necessary. 
 Header AS DBFHeadStruc      - See above.   
END TYPE
------------------------------------------------------------------------------
------------------------------------------------------------------------------
 I have included the following files for you to ponder over:

 NDXPACK.BI  - Include file for the whole "shibang"
 DBFNDX.MAK  - Project file for the code
 DBASE.BAS   - Dbase file operations
 NDXPACK.BAS - Hdx & file & Index operations
 NEXUS.BAS   - An interface for working with the recursive Traverse routines.
 FORM1.FRM   - A demo form to accept records from the Traverse routines.

 EXAM1,2.BAS - Sample code for applying the modules above once they have
                 become converted into a QUICK LIBRARY.

    As for the rest, I leave you to my code which is copiously filled with
remarks for you to sneer and laugh at. Seriously though, I would appreciate
any suggestions or ideas you may have to better this scheme.

    All of this is condensed into the working demo zip file NDXDBF.EXE found
in the same forum that this came from, and from which I got a mediocre res-
ponse.  I suppose since the ISAM/Access database development kits & tools
offer a quicker and easier method than this madness, it was to be expected,
yet as near as I can tell, the ability to provide network access and control
to the developer for SINGLE RECORDS under these very capable development 
tools is either not available or eludes me.

    If you should find all this code and information useful and/or to your
liking, please show your appreciation by sending $25.00 to:

                            J. Lerma Jr.
                            2413 McClelland
                            Laredo, Tx. 78040


    Include another $5.00 and I will send you my source code for implementing
this scheme to work for network/multi-user VBMSDOS applications.  Please in-
clude disk preference with your renumeration.

    
    So, you say, why should I spend my hard-earned money on your ideas? 
    Fair enough. Here come's the hard sell.

    I have developed a scheme for using this implementation in a network
environment with the following capabilities:

    1. Single record control for locking/unlocking
    2. Ability to use Soft/Hard Record & File Locks
    3. Multi-user index file capability
    4. Reprocess settings for poll busy instances of records
    5. Error Messages for index/record polling failures & file corruption
    6. Built in index update capability for poll out busy instances on index
         files which a system operator can retrieve to update the files before
         they become corrupt
    7. Hdx & Dbf File flag clearing routines for system crashes/failures.
    
    & MORE!

    This implementation is designed to do without record managers, drivers,
and devices, providing all the tools necessary, through the HDX & DBF file
structures and the modules, in a small enough package to place directly into 
your application and still leave room for your code.  Ease and flexibility
makes this network version generic enough to adapt from application to appli-
cation. Try it you'll like it!

To reiterate the capabilities of this scheme:

Work with DBF Files with up to 100 fields!
OpenDbf let's you pass File Handles to the modules which you control!
Search for fields by name or number!
Returns a field's DBASE data type!
Returns date values in proper BASIC format! (mm-dd-yyyy)
Saves date values in proper DBASE format! (yyyymmdd)
Checks for NULL Date fields!
Double precision capability for numeric fields!
Stores Numeric fields according to DBASE Field DEC format!

****************************************************************************
****************************************************************************

The INDEX CONSTRUCTOR and accompanying interface will allow you to
create index files with BTREE indexes on up to 10 KEY field values 
which you can TRAVERSE in ascending or descending order with a file
overhead of 20 bytes per record!

Store several indexes for each DBF file in one index file or group
indexes for several DBF files in one Index file!

With multiple index capability per file, up to 99 indexes on as many
DBF files can be active with only one file open!

Built in maintenance capability keeps track of unused file space to
prevent bloating of index files!

Update your indexes rapidly with one module call and an average search 
time of approximately 20 seeks for 100,000 records (tested)!

Index overhead is fixed at maximum of 10.9K bytes for a full index file
and 20 bytes per record regardless of DBF file or record/field length!

Constructor interface assumes DBF Records being in index order and
assures balanced tree formation for existing DBF files!

Traverse routine checks for BTree linear tendency and returns caution
through an error code to help prevent stack space errors!